A Tight Prediction Interval for False Discovery Proportion under Dependence
Authors
Abstract
The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used. An accurate prediction interval for the FDP is highly desirable in such applications. Because some degree of dependence among test statistics exists in almost all applications involving multiple testing, methods for constructing tight prediction intervals for the FDP that account for this dependence are of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP under some semi-parametric assumptions on the dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence and is generally more accurate than those obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.
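To make the distinction between the FDP and the FDR concrete, the following is a minimal Monte Carlo sketch. It is not the variance formula or the permutation procedure of the paper. It assumes one-sided z-tests with an equicorrelated normal dependence structure, applies the Benjamini-Hochberg step-up rule, and reads off an empirical upper quantile of the simulated FDP. The function names (`bh_reject`, `simulate_fdp`, `empirical_quantile`) and all parameter values (`m`, `m1`, `rho`, `effect`, `alpha`) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for one-sided p-values


def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up rule: return the set of rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def simulate_fdp(m=200, m1=20, rho=0.2, effect=3.0, alpha=0.1, n_rep=300, seed=0):
    """Simulate the FDP of BH under equicorrelated one-sided z-tests (assumed model)."""
    rng = random.Random(seed)
    fdps = []
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)  # common factor giving pairwise correlation rho
        pvalues = []
        for i in range(m):
            mean = effect if i < m1 else 0.0  # first m1 hypotheses are false nulls
            z = mean + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            pvalues.append(1.0 - STD_NORMAL.cdf(z))
        rejected = bh_reject(pvalues, alpha)
        false_rejections = sum(1 for i in rejected if i >= m1)
        fdps.append(false_rejections / max(len(rejected), 1))  # FDP = V / max(R, 1)
    return fdps


def empirical_quantile(values, q):
    """Smallest sample value with at least a fraction q of the sample at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[index]


fdps = simulate_fdp()
mean_fdp = sum(fdps) / len(fdps)
upper_95 = empirical_quantile(fdps, 0.95)
print(f"mean FDP (Monte Carlo estimate of the FDR): {mean_fdp:.3f}")
print(f"empirical 95% upper quantile of the FDP: {upper_95:.3f}")
```

The mean of the simulated FDP values estimates the FDR, while the upper quantile indicates how far the realized FDP can sit above that average in a single experiment. This gap is what an upper prediction interval for the FDP is meant to capture. In practice the true null status is unknown, so such a quantile cannot be computed directly from data.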
Similar resources
Predicting False Discovery Proportion Under Dependence
We present a flexible framework for predicting error measures in multiple testing situations under dependence. Our approach is based on modeling the distribution of the probit transform of the p-values by mixtures of multivariate skew-normal distributions. The model can incorporate dependence among p-values and it allows for shape restrictions on the p-value density. A nonparametric Bayesian sc...
False discovery control for multiple tests of association under general dependence
We propose a confidence envelope for false discovery control when testing multiple hypotheses of association simultaneously. The method is valid under arbitrary and unknown dependence between the test statistics and allows for an exploratory approach when choosing suitable rejection regions while still retaining strong control over the proportion of false discoveries.
Estimating the Proportion of Nonzero Normal Means under Certain Strong Covariance Dependence
The proportion of a certain type of hypotheses is a key component of adaptive false discovery procedures in multiple testing. To date, a good estimator of the proportion of false null hypotheses under dependence is lacking. For multiple testing of normal means, we develop a (uniformly) consistent estimator of the proportion of nonzero normal means when the dependent test statistics follow a joint no...
SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures
MOTIVATION: The pre-estimate of the proportion of null hypotheses (π₀) plays a critical role in controlling false discovery rate (FDR) in multiple hypothesis testing. However, hidden complex dependence structures of many genomics datasets distort the distribution of p-values, rendering existing π₀ estimators less effective. RESULTS: From the basic non-linear model of the q-value method, we ...
Estimating False Discovery Proportion Under Arbitrary Covariance Dependence.
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging u...